Random modification effect in the size of the fluctuation of 
the LCS of two sequences of i.i.d. blocks 



The problem of the order of the fluctuation of the Longest Common Subsequence (LCS) of
two independent sequences has been open for decades. There exist contradicting conjectures
on the topic, [1] and [2]. In the present article, we consider a special model of i.i.d. sequences
made out of blocks. A block is a contiguous substring consisting of only one type of symbol.
Our model allows only three possible block lengths, each chosen with equal probability. For
i.i.d. sequences with equiprobable symbols, the blocks are independent of each other. For
this model, we introduce a random operation (random modification) on the blocks of one
of the sequences. In this article, for our block model, we show the techniques to prove the
following: if we suppose that the random modification increases the length of the LCS with
high probability, then the order of the fluctuation of the LCS is as conjectured by Waterman
[2]. This result is a key technical part in the study of the size of the fluctuation of the LCS
for sequences of i.i.d. blocks, developed in [3].

1 Model and main results 

Throughout this paper, $X$ and $Y$ will denote two finite strings over a finite alphabet
$\Sigma$. A common subsequence of $X$ and $Y$ is a subsequence which is a subsequence of $X$ as well
as of $Y$. A Longest Common Subsequence of $X$ and $Y$ (denoted simply by LCS of $X$ and $Y$,
or only LCS when the context is clear enough) is a common subsequence of $X$ and $Y$ of maximal
length. For a motivation on why to study the LCS problem, the reader can look at [3, 4].

Let $l > 0$ be an integer parameter. Let $B_{X1}, B_{X2}, \ldots$ and $B_{Y1}, B_{Y2}, \ldots$ be two i.i.d.
sequences independent of each other such that:

\[ P(B_{X1} = l-1) = P(B_{X1} = l) = P(B_{X1} = l+1) = \frac{1}{3}, \qquad P(B_{Y1} = l-1) = P(B_{Y1} = l) = P(B_{Y1} = l+1) = \frac{1}{3}. \]

We call the runs of 0's and 1's blocks. Let $X^\infty = X_1X_2X_3\ldots$ be the binary sequence such
that the $i$-th block has length $B_{Xi}$, where $X_1$ is chosen to be 0 with probability 1/2 or 1 with
probability 1/2. Similarly, let $Y^\infty = Y_1Y_2Y_3\ldots$ be the binary sequence such that the $i$-th block
has length $B_{Yi}$, where $Y_1$ is chosen to be 0 with probability 1/2 or 1 with probability 1/2.

Example 1.1 Assume that $X_1 = 0$ and $B_{X1} = 3$, $B_{X2} = 4$ and $B_{X3} = 2$. Then the
sequence $X^\infty$ starts as follows: $X^\infty = 000111100\cdots$, meaning that the first
block consists of three 0's, the second block consists of four 1's, the third block consists of two
0's, etc.
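The construction in Example 1.1 can be sketched in a few lines of Python. This is only an illustration: the function name `blocks_to_bits` and the representation of a sequence as a first symbol plus a list of block lengths are assumptions made here, not notation from the paper.

```python
def blocks_to_bits(first_symbol, block_lengths):
    """Concatenate runs of 0's and 1's with the given lengths.

    first_symbol is 0 or 1; consecutive blocks alternate between the two symbols.
    """
    symbol = first_symbol
    pieces = []
    for length in block_lengths:
        pieces.append(str(symbol) * length)
        symbol = 1 - symbol  # the next block uses the other symbol
    return "".join(pieces)


# Example 1.1: X_1 = 0 and block lengths 3, 4, 2
prefix = blocks_to_bits(0, [3, 4, 2])
print(prefix)
```

Truncating the resulting string to its first $n$ characters gives the finite string $X$ used in the rest of the paper.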

Let $X$ denote the sequence obtained by taking only the first $n$ bits of $X^\infty$, namely $X =
X_1X_2X_3\ldots X_n$, and similarly $Y = Y_1Y_2Y_3\ldots Y_n$. Let $L_n$ denote the length of the LCS of $X$
and $Y$, $L_n := |\mathrm{LCS}(X, Y)|$.
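For readers who want to experiment numerically, the quantity $L_n$ can be computed with the classical dynamic programming recursion for the LCS. The sketch below is a standard textbook method, not something taken from the paper, and the name `lcs_length` is chosen here for illustration.

```python
def lcs_length(x, y):
    """Length of a longest common subsequence of the strings x and y.

    Uses the usual O(len(x) * len(y)) dynamic program, keeping one row at a time.
    """
    prev = [0] * (len(y) + 1)
    for a in x:
        curr = [0]
        for j, b in enumerate(y, start=1):
            if a == b:
                curr.append(prev[j - 1] + 1)   # extend a common subsequence
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


print(lcs_length("000111100", "001110000"))
```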

The main result of [3, 4] states that for $l$ large enough, the order of the fluctuation of $L_n$ is
$\sqrt{n}$:

Theorem 1.1 There exists $l_0$ so that for all $l > l_0$ we have that:

\[ \mathrm{VAR}[L_n] = \Theta(n) \]

for $n$ large enough.

In [3, 4] the authors showed that Theorem 1.1 is equivalent to proving that "a certain random
modification has a biased effect on $L_n$". Similar techniques appear in other
papers (for instance see [5], [6]). So the main difficulty is actually proving that the random
modification typically has a biased effect on the LCS, which for the block model is connected
to a constrained optimization problem [3, 4]. This random modification is performed as
follows: we choose at random in $X$ one block of length $l-1$ and one block of length $l+1$.
That is, all the blocks in $X$ of length $l-1$ have the same probability of being chosen, and
likewise all the blocks in $X$ of length $l+1$. Then we change the length of both chosen blocks
to $l$. The resulting new sequence is denoted by $\tilde{X}$. Let $\tilde{L}_n$ denote the length of the LCS after
our modification of $X$. Hence:

\[ \tilde{L}_n := |\mathrm{LCS}(\tilde{X}, Y)|. \]

If we can prove that our block length changing operation typically has a biased effect on the
LCS, then the order of the fluctuation of $L_n$ is $\sqrt{n}$. This is the content of the next theorem:

Theorem 1.2 Assume that there exist $\varepsilon > 0$ and $\alpha > 0$ not depending on $n$ such that for
all $n$ large enough we have:

\[ P\left( E[\tilde{L}_n - L_n \mid X, Y] \ge \varepsilon \right) \ge 1 - \exp(-n^{\alpha}). \tag{1.1} \]

Then,

\[ \mathrm{VAR}[L_n] = \Theta(n) \]

for $n$ large enough.

The above theorem reduces the problem of the order of fluctuation to proving that our random
modification typically has a higher probability of leading to an increase than to a decrease
in score. The main result of this article is Theorem 1.2.
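The random modification described above can be sketched as follows. The sketch assumes, for illustration only, that the string $X$ is represented by the list of its block lengths (the rest block and the symbols are left out), and the name `random_modification` is hypothetical.

```python
import random


def random_modification(block_lengths, l, rng):
    """Pick one block of length l-1 and one of length l+1 uniformly; set both to l.

    block_lengths: list of block lengths of the string (assumed representation).
    rng: a random.Random instance supplying the randomness.
    Returns a new list; the total length of the string is unchanged.
    """
    short = [i for i, b in enumerate(block_lengths) if b == l - 1]
    long_ = [i for i, b in enumerate(block_lengths) if b == l + 1]
    if not short or not long_:
        raise ValueError("need at least one block of length l-1 and one of length l+1")
    modified = list(block_lengths)
    modified[rng.choice(short)] = l   # every block of length l-1 is equally likely
    modified[rng.choice(long_)] = l   # every block of length l+1 is equally likely
    return modified


rng = random.Random(0)  # seed chosen arbitrarily for reproducibility
print(random_modification([3, 4, 3, 2, 2], 3, rng))
```

Because one block grows by one symbol and another shrinks by one symbol, the modified string still has length $n$, which is why $\tilde{X}$ can be compared with the same $Y$.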



A very useful tool we often use is the Azuma-Hoeffding theorem. The following is a version
of it for martingales (for a proof see [7]):

Theorem 1.3 (Hoeffding's inequality) Let $(V_n, \mathcal{F}_n)$ be a martingale, and suppose that there
exists a sequence $a_1, a_2, \ldots$ of real numbers such that

\[ P(|V_n - V_{n-1}| \le a_n) = 1 \]

for all $n$. Then:

\[ P(|V_n - V_0| \ge v) \le 2\exp\left( -\frac{1}{2}\, v^2 \Big/ \sum_{i=1}^{n} a_i^2 \right) \tag{1.2} \]

for every $v > 0$.

We will also use a corollary of the above theorem for some intermediate bounds:

Corollary 1.1 Let $a > 0$ be a constant and $V_1, V_2, \ldots$ be an i.i.d. sequence of bounded random
variables such that:

\[ P(|V_i - E[V_i]| \le a) = 1 \]

for every $i = 1, 2, \ldots$ Then for every $\Delta > 0$, we have that:

\[ P\left( \left| \frac{V_1 + \cdots + V_n}{n} - E[V_1] \right| \ge \Delta \right) \le 2\exp\left( -\frac{\Delta^2}{2a^2}\, n \right) \tag{1.3} \]
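As a numerical sanity check, the sketch below compares a bound of the form $2\exp(-\Delta^2 n/(2a^2))$ with an empirical tail frequency. The form of the bound is the one obtained from Theorem 1.3 with $a_i = a$ and $v = \Delta n$; the choice of Uniform(0,1) variables, the sample sizes and the function names are assumptions made purely for illustration.

```python
import math
import random


def hoeffding_bound(delta, n, a):
    """Right-hand side 2*exp(-delta^2 * n / (2 a^2)) for the mean of n variables."""
    return 2.0 * math.exp(-(delta ** 2) * n / (2.0 * a ** 2))


def empirical_tail(delta, n, trials, rng):
    """Fraction of trials where the mean of n Uniform(0,1) draws is off by >= delta."""
    hits = 0
    for _ in range(trials):
        mean = sum(rng.random() for _ in range(n)) / n
        if abs(mean - 0.5) >= delta:
            hits += 1
    return hits / trials


rng = random.Random(12345)               # fixed seed, chosen arbitrarily
bound = hoeffding_bound(0.1, 200, 0.5)   # a Uniform(0,1) variable is within a = 0.5 of its mean
freq = empirical_tail(0.1, 200, 2000, rng)
print(bound, freq)
```

The empirical frequency is typically far below the bound, which is expected: Hoeffding-type inequalities hold for every bounded distribution and are therefore not tight for any particular one.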



2 Random modification effect in the fluctuation

We are going to prove Theorem 1.2, which states that $\mathrm{VAR}[L_n] = \Theta(n)$ holds if there exist
$\varepsilon, \alpha > 0$ not depending on $n$ such that:

\[ P\left( E[\tilde{L}_n - L_n \mid X, Y] \ge \varepsilon \right) \ge 1 - \exp(-n^{\alpha}) \tag{2.1} \]

for all $n$ large enough. We have omitted some of the proofs for brevity, but all the
details can be found in [4].

Note that if $Z$ is a random variable with $\mathrm{VAR}[Z] = \Theta(n)$ and $f$ is a map which tends
to increase linearly, then for $W = f(Z)$ we also have $\mathrm{VAR}[W] = \Theta(n)$. The map
$f$ can even be a random map, but it must be independent of $Z$. The exact basic result ([6],
Lemma 3.2) goes as follows:

Lemma 2.1 Let $c > 0$ be a constant. Assume that $g : \mathbb{R} \to \mathbb{R}$ is a map which is everywhere
differentiable and such that for all $x \in \mathbb{R}$ we have:

\[ \frac{dg(x)}{dx} \ge c. \]

Let $B$ be a random variable such that $E[|g(B)|] < +\infty$. Then:

\[ \mathrm{VAR}[g(B)] \ge c^2 \cdot \mathrm{VAR}[B]. \]

In the present context, we need a slightly different version:

Lemma 2.2 Let $\varepsilon, m > 0$ be constants and $f : \mathbb{Z} \to \mathbb{Z}$ be a map such that for all $z_1 < z_2$ the
following two conditions hold:

\[ z_2 - z_1 \ge m \;\Rightarrow\; f(z_2) - f(z_1) \ge \frac{\varepsilon}{8}(z_2 - z_1) \tag{2.2} \]

\[ \exists \beta > 0 : \; z_2 - z_1 < m \;\Rightarrow\; f(z_2) - f(z_1) \le \beta (z_2 - z_1) \tag{2.3} \]

Let $B$ be a random variable such that $E[|f(B)|] < +\infty$. Then:

\[ \mathrm{VAR}[f(B)] \ge \frac{\varepsilon^2}{64}\left( 1 - 16\, \frac{(\varepsilon/8 + \beta)\, m}{\varepsilon \sqrt{\mathrm{VAR}[B]}} \right) \mathrm{VAR}[B] \tag{2.4} \]

Proof. Let $h : \mathbb{Z} \to \mathbb{Z}$ be a map defined from $f$ as follows: for a given $z \in \mathbb{Z}$ choose $k \in \mathbb{Z}$
such that $z \in [km, (k+1)m]$ and compute

\[ h(z) = \left( \frac{f((k+1)m) - f(km)}{m} \right)(z - km) + f(km). \]

Then $h(z)$ is just the linear interpolation of $f(z)$ in $[km, (k+1)m]$. It is easy to see that $h$
satisfies the conditions of Lemma 2.1 for $c = \varepsilon/8$. Then:

\[ \mathrm{VAR}[h(B)] \ge \frac{\varepsilon^2}{64}\, \mathrm{VAR}[B]. \tag{2.5} \]

We want to estimate the distance between the random variables $h(B)$ and $f(B)$. First, we note
that from 2.2 and by the definition of $h$, the following inequalities hold for $km \le B \le (k+1)m$:

\[ \frac{\varepsilon}{8}(B - km) + f(km) \;\le\; f(B),\, h(B) \;\le\; \frac{\varepsilon}{8}(B - (k+1)m) + f((k+1)m). \]

Looking at conditions 2.2, 2.3 and the inequalities above we get

\[ |h(B) - f(B)| \le \left| \frac{\varepsilon}{8}(B - km) + f(km) - \left( \frac{\varepsilon}{8}(B - (k+1)m) + f((k+1)m) \right) \right| \le \frac{\varepsilon}{8}\, m + |f((k+1)m) - f(km)| \]

and by using the last inequality above:

\[ \mathrm{VAR}[f(B) - h(B)] \le \left( \frac{\varepsilon}{8} + \beta \right)^2 m^2. \tag{2.6} \]

Since $f(B) = h(B) + (f(B) - h(B))$ we can apply the triangle inequality and find:

\[ \sqrt{\mathrm{VAR}[f(B)]} \ge \sqrt{\mathrm{VAR}[h(B)]} - \sqrt{\mathrm{VAR}[f(B) - h(B)]}, \]

hence we have:

\[ \mathrm{VAR}[f(B)] \ge \mathrm{VAR}[h(B)] - 2\sqrt{\mathrm{VAR}[h(B)]} \cdot \sqrt{\mathrm{VAR}[f(B) - h(B)]} = \mathrm{VAR}[h(B)] \left( 1 - 2\, \frac{\sqrt{\mathrm{VAR}[f(B) - h(B)]}}{\sqrt{\mathrm{VAR}[h(B)]}} \right). \]

Finally, applying the inequalities 2.5 and 2.6 to the last inequality above, we get inequality 2.4. ■

Hence to prove that $\mathrm{VAR}[L_n] = \Theta(n)$, we try to represent $L_n$ as $f(Z)$ where $f$ is a random
map which tends to increase linearly on a certain scale and $Z$ is a random variable having
fluctuation of order $\sqrt{n}$.

2.1 Random modifications and the variables (T, Z, R)

Let $N_l$ denote the number of blocks in $X$ of length $l$, whilst $N_{l-1}$, resp. $N_{l+1}$, denote the
number of blocks of length $l-1$, resp. $l+1$, in $X$. Let us define the following three random
variables:

\[ T := N_{l-1} + N_l + N_{l+1} \tag{2.7} \]
\[ Z := N_l - N_{l-1} - N_{l+1} \tag{2.8} \]
\[ R := n - \left( (l-1) N_{l-1} + l N_l + (l+1) N_{l+1} \right) \tag{2.9} \]

Note that when we know the values of $(T, Z, R)$ we can determine the values of $N_{l-1}$, $N_l$ and
$N_{l+1}$ as a linear function by using the definitions of $T$, $Z$ and $R$ as follows:

\[ \begin{pmatrix} N_{l-1}(T,Z,R) \\ N_l(T,Z,R) \\ N_{l+1}(T,Z,R) \end{pmatrix} = \begin{pmatrix} (2l+1)/4 & -1/4 \\ 1/2 & 1/2 \\ -(2l-1)/4 & -1/4 \end{pmatrix} \begin{pmatrix} T \\ Z \end{pmatrix} + \begin{pmatrix} -(n-R)/2 \\ 0 \\ (n-R)/2 \end{pmatrix} \tag{2.10} \]

The variable $R$ represents what is left in $X$ after the last block of length $l-1$, $l$ or $l+1$.

Example 2.1 Let us consider the sequence $X = 000111100011001$ for $l = 3$ and $n = 15$. We
see that $N_{l-1} = 2$, $N_l = 2$ and $N_{l+1} = 1$, hence $T = 5$, $Z = -1$ and $R = 1$. Also, the block 1
at the end of $X$ has length strictly smaller than $l-1$, which also means that $R = 1$. In this
case it is easy to interpret what $R$ is, since the last block in $X$ has length strictly less than $l-1$.
Let us see a different situation. Let us take again $l = 3$ and now consider $B_{X1} = 2$, $B_{X2} =
3$, $B_{X3} = 4$, $B_{X4} = 3$, $B_{X5} = 2$, $B_{X6} = 4, \ldots$ such that $X^\infty = 001110000111001111\cdots$ Take
$n = 16$ so that $X = 0011100001110011$. Here the last block of $X$ has length $l-1 = 2$, which
should imply (using the point of view of the previous situation) that $R = 0$. But notice that the
block in $X^\infty$ corresponding to $B_{X6}$ was cut when we took $X$. In this case, we say that the last
block in $X$ corresponds to the rest, so $R = 2$ and therefore $N_{l-1} = 2$, $N_l = 2$ and $N_{l+1} = 1$,
then $T = 5$ and $Z = -1$. We adopt this convention on $R$, even if definition 2.9 is then not
exact, because it simplifies the later computation of the joint distribution
of $N_{l-1}, N_l, N_{l+1}$.
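The bookkeeping in Example 2.1 can be sketched in Python as follows. The sketch assumes that we are given the block lengths of $X^\infty$ (enough of them to cover the first $n$ bits) and applies the convention that a block cut by the truncation counts as the rest. The function names are hypothetical, and the inversion formula is the linear system $N_{l-1} = \frac{2l+1}{4}T - \frac{1}{4}Z - \frac{n-R}{2}$, $N_l = \frac{1}{2}T + \frac{1}{2}Z$, $N_{l+1} = -\frac{2l-1}{4}T - \frac{1}{4}Z + \frac{n-R}{2}$ that follows from the definitions of $T$, $Z$ and $R$.

```python
from fractions import Fraction


def block_statistics(block_lengths, n, l):
    """Return (T, Z, R) for the first n bits of a sequence with the given blocks.

    Only blocks lying entirely within the first n bits are counted; a block cut
    by the truncation is treated as the rest, following the convention of Example 2.1.
    """
    counts = {l - 1: 0, l: 0, l + 1: 0}
    used = 0
    for b in block_lengths:
        if used + b > n:
            break                      # this block is cut: it becomes the rest
        counts[b] += 1
        used += b
    t = counts[l - 1] + counts[l] + counts[l + 1]
    z = counts[l] - counts[l - 1] - counts[l + 1]
    r = n - used
    return t, z, r


def counts_from_tzr(t, z, r, n, l):
    """Recover (N_{l-1}, N_l, N_{l+1}) from (T, Z, R) via the linear system."""
    half = Fraction(n - r, 2)
    n_short = Fraction(2 * l + 1, 4) * t - Fraction(z, 4) - half
    n_mid = Fraction(t + z, 2)
    n_long = -Fraction(2 * l - 1, 4) * t - Fraction(z, 4) + half
    return n_short, n_mid, n_long


t, z, r = block_statistics([2, 3, 4, 3, 2, 4], n=16, l=3)
print(t, z, r, counts_from_tzr(t, z, r, n=16, l=3))
```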

Let us roughly explain the main idea behind this subsection. Assume that we have a random
couple $(V, W)$ which can take only a finite number of values. We also assume the joint
distribution $\mathcal{L}(V, W)$ to be given. To simulate $(V, W)$, we could first simulate $V$ using the
marginal law $\mathcal{L}(V)$. We would obtain a numeric value $v_0$. Then, we could simulate $W$ using
the conditional law $\mathcal{L}(W \mid V = v_0)$ and obtain the numeric value $w_0$. The couple $(v_0, w_0)$
has joint distribution $\mathcal{L}(V, W)$. Another, less efficient, possibility is to simulate, for each
(non-random) value $v$ that $V$ can take, a value for $W$ with distribution $\mathcal{L}(W \mid V = v)$. Call the
numeric value $w(v)$. Then, we would simulate $V$ with distribution $\mathcal{L}(V)$ and obtain a numeric
value $v_0$. Then, for $W$ we would take, among all the values which we have simulated, the one
corresponding to $V = v_0$. In this manner, we get $(v_0, w(v_0))$. This couple has the distribution
$\mathcal{L}(V, W)$, and this does not even require that we simulate the different $w(v)$'s independently
of each other. Only $V$ needs to be simulated independently of the assignment $v \mapsto w(v)$.
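The second simulation scheme can be checked exactly on a toy example. In the sketch below, the laws of $V$ and of $W$ given $V$ are arbitrary illustrative numbers, and the whole table $v \mapsto w(v)$ is driven by a single shared uniform variable on a grid of ten points, so the values $w(v)$ are deliberately dependent on each other. Enumerating all outcomes shows that $(V, w(V))$ still has the prescribed joint law, as long as $V$ is independent of the table.

```python
from fractions import Fraction
from itertools import product

# Toy laws (illustrative assumptions): V in {0, 1}, W in {0, 1}
p_v = {0: Fraction(3, 10), 1: Fraction(7, 10)}            # law of V
p_w1_given_v = {0: Fraction(2, 10), 1: Fraction(6, 10)}   # P(W = 1 | V = v)


def joint_law_via_table(grid=10):
    """Exact law of (V, w(V)) when one shared uniform u builds the table v -> w(v)."""
    joint = {}
    for u, (v, pv) in product(range(grid), p_v.items()):
        # The same u is used for every v, so the entries w(v) are dependent.
        table = {k: int(Fraction(u, grid) < p) for k, p in p_w1_given_v.items()}
        key = (v, table[v])
        joint[key] = joint.get(key, 0) + pv * Fraction(1, grid)  # V independent of u
    return joint


target = {
    (v, w): p_v[v] * (p_w1_given_v[v] if w == 1 else 1 - p_w1_given_v[v])
    for v in p_v
    for w in (0, 1)
}
print(joint_law_via_table() == target)
```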

We are going to apply the above simulation scheme with $V$ being $(T, Z, R)$ and $W$ being the
rest of the information in $(X, Y)$. More precisely, for all possible non-random values $(t, z, r)$,
we simulate $X$ conditional on $(T, Z, R) = (t, z, r)$. The resulting string is denoted by $X_{(t,z,r)}$
and thus has distribution

\[ \mathcal{L}(X_{(t,z,r)}) = \mathcal{L}(X \mid (T, Z, R) = (t, z, r)). \]

Let $L_n(t, z, r)$ denote the length of the LCS

\[ L_n(t, z, r) := |\mathrm{LCS}(X_{(t,z,r)}, Y)|. \]

We assume that the simulation of the string $X_{(t,z,r)}$ is done independently of $(T, Z, R)$ and
of $Y$. In this manner, we get that $L_n(T, Z, R)$ has the same distribution as $L_n = |\mathrm{LCS}(X, Y)|$.
So to prove that $\mathrm{VAR}[L_n] = \Theta(n)$, it is enough to prove that

\[ \mathrm{VAR}[L_n(T, Z, R)] = \Theta(n). \tag{2.11} \]



We saw at the beginning of this section (see Lemmas 2.1 and 2.2) that when we transform
a variable having variance of order $\Theta(n)$ with a map which tends to increase linearly, then
the resulting variable has variance of order $\Theta(n)$. It is easy to see that $\mathrm{VAR}[Z] = \Theta(n)$ (see
also Lemma 2.8). Hence to prove 2.11, it is enough to show that with high probability the
(random) map

\[ z \mapsto L_n(T, z, R) \]

tends to increase linearly (on the appropriate scale and on a domain on which $Z$ typically
takes its values). That means, we need to show that we can simulate the values $L_n(t, z, r)$ in
such a manner as to get the desired distribution $\mathcal{L}(X \mid (T, Z, R) = (t, z, r))$ as well as the desired
linear increase of the map $z \mapsto L_n(T, z, R)$. This is achieved by simulating $X_{(t,z,r)}$ in the
following way: for a given value $(t, r)$ such that $P((T, R) = (t, r)) \ne 0$, we take a leftmost
(leftmost to be defined later) value $z_0$ and simulate a string with distribution equal to the
conditional distribution of $X$ given $(T, Z, R) = (t, z_0, r)$. The resulting string is denoted by
$X_{(t,z_0,r)}$. Then, we apply the random modification to $X_{(t,z_0,r)}$. This means, we choose one
block of length $l-1$ and one block of length $l+1$ at random in $X_{(t,z_0,r)}$ and turn them both
into length $l$. The resulting string is denoted by $X_{(t,z_0+4,r)}$. Then, we choose at random in
$X_{(t,z_0+4,r)}$ a block of length $l-1$ and a block of length $l+1$ and turn them both into length $l$.
The new string which we obtain in this manner is denoted by $X_{(t,z_0+8,r)}$. We keep repeating
this same operation to obtain the sequence of strings

\[ X_{(t,z_0,r)},\; X_{(t,z_0+4,r)},\; X_{(t,z_0+8,r)},\; \ldots \tag{2.12} \]

For each value of $(t, r)$ with $P((T, R) = (t, r)) \ne 0$ we obtain two finite sequences of strings:
first 2.12 and then

\[ X_{(t,z_0+2,r)},\; X_{(t,z_0+6,r)},\; X_{(t,z_0+10,r)},\; \ldots \]

by a similar procedure. Namely, after $X_{(t,z_0+2,r)}$ is generated with the distribution of $X$ conditional
on $(T, Z, R) = (t, z_0+2, r)$, the subsequent strings are obtained by successively applying the
random modification, which chooses at random in the string a block of length $l-1$ and
a block of length $l+1$ and turns them both into length $l$.



Recall that in this section we assume that our random modification has a biased effect of
$\varepsilon > 0$ on the LCS, so that with high probability

\[ E[\tilde{L}_n - L_n \mid X, Y] \ge \varepsilon. \]

Hence, it follows that the map $z \mapsto L_n(T, z, R)$ tends with high probability to increase with
slope close to $\varepsilon$ on a scale of a constant times $\ln n$ (the constant must be taken large enough
though, see Lemma 2.6 and Proposition 2.2). In other words, since the random modification has a
biased positive effect, the map $z \mapsto L_n(T, z, R)$ behaves like a random walk with drift $\varepsilon$. The
only thing which remains to be proved is that with our scheme of using the random
modification, the strings $X_{(t,z,r)}$ have the right distribution, i.e. the distribution of $X$ conditional
on $(T, Z, R) = (t, z, r)$. This is proved in Lemma 2.5.



We have so far summarized the idea which explains why the biased effect of the random
modification implies $\mathrm{VAR}[L_n] = \Theta(n)$. There is one more detail which we should mention
and which makes the notation a little more difficult. To prove that $z \mapsto L_n(T, z, R)$ tends to
increase linearly we use the biased effect on the LCS for the random modification. However,
this bias holds with high probability for $X$ and not for $X_{(t,z,r)}$. When we look at the
conditional distribution of $X$ given $(T, Z, R) = (t, z, r)$, we divide by the probability

\[ P((T, Z, R) = (t, z, r)). \tag{2.13} \]

The string $X_{(t,z,r)}$ has the distribution of $X$ conditional on $(T, Z, R) = (t, z, r)$. So for the biased
effect to have large probability also for $X_{(t,z,r)}$ (and not just for $X$), we need the probability
2.13 not to be too small. To ensure this, we will restrict ourselves to "typical" values for
$(T, Z, R)$. We will consider only values for $(T, Z)$ which lie in a domain $D = D_T \times D_Z$ (see
the definition of $D$ below) and prove that any possible value $(t, z) \in D_T \times D_Z$ has polynomially
bounded probability (see Lemma 2.4).
Let us now give all the details:

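Proposition 2.1 Given $\varepsilon > 0$ there exist constants $1 < k_1, k_2, k_3 < k^*$, all not depending on
$n$ but on $\varepsilon$, such that:

\[ P\left( \left| N_{l-1} - \frac{n}{3l} \right| \le k_1 \sqrt{n} \right),\;\; P\left( \left| N_l - \frac{n}{3l} \right| \le k_2 \sqrt{n} \right),\;\; P\left( \left| N_{l+1} - \frac{n}{3l} \right| \le k_3 \sqrt{n} \right) \;\ge\; 1 - \varepsilon \tag{2.14} \]

for every $n$ large enough.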



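We will need the following lemma later:

Lemma 2.3 There exists $c > 0$ not depending on $n$ such that:

\[ P\left( T \in \left[ \frac{n}{l} - c\sqrt{n},\; \frac{n}{l} + c\sqrt{n} \right],\;\; Z \in \left[ -\frac{n}{3l} - c\sqrt{n},\; -\frac{n}{3l} + c\sqrt{n} \right] \right) \ge 0.9 \tag{2.15} \]

Let

\[ D_T := \left[ \frac{n}{l} - c\sqrt{n},\; \frac{n}{l} + c\sqrt{n} \right], \qquad D_Z := \left[ -\frac{n}{3l} - c\sqrt{n},\; -\frac{n}{3l} + c\sqrt{n} \right] \]

and let $D$ denote the domain

\[ D := D_T \times D_Z. \]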
Given $(t, z) \in D$ such that $(T, Z) = (t, z)$ we have

\[ N_{l-1}(t, z) + N_l(t, z) + N_{l+1}(t, z) = t. \]

The probability for a realization of $N_{l-1}$, $N_l$ and $N_{l+1}$ is given by:

\[ P(T = t, Z = z, R = r) = \binom{N_{l-1}(t,z) + N_l(t,z) + N_{l+1}(t,z)}{N_{l-1}(t,z) \;\; N_l(t,z) \;\; N_{l+1}(t,z)} \left( \frac{1}{3} \right)^{N_{l-1}(t,z) + N_l(t,z) + N_{l+1}(t,z)} P(B_{X1} > r) \tag{2.16} \]

\[ = \frac{t!}{(N_{l-1}(t,z))!\, (N_l(t,z))!\, (N_{l+1}(t,z))!} \left( \frac{1}{3} \right)^{t} P(B_{X1} > r) \tag{2.17} \]

where the probability $P(B_{X1} > r) = P(R = r)$ is due to the convention on $R$ described in
Example 2.1. Finally, due to 2.10, for any $n_1, n_2, n_3 \in \mathbb{N}$ the conditional joint distribution

\[ P(N_{l-1}(T, Z, R) = n_1,\; N_l(T, Z, R) = n_2,\; N_{l+1}(T, Z, R) = n_3 \mid R = r) \]

is multinomial.
Lemma 2.4 There exists $k_0 > 0$ not depending on $n$ (but depending on $c$) such that for every
$(t, z) \in D$ and $r < l+1$ for which the probability $P((T, Z, R) = (t, z, r)) \ne 0$, we have that:

\[ P((T, Z, R) = (t, z, r)) \ge \frac{k_0}{n} \]

for every $n$ large enough.



Note that for any variables $X$ and $Y$ we have (see for example [8])

\[ \mathrm{VAR}[Y] = E[\mathrm{VAR}[Y \mid X]] + \mathrm{VAR}[E[Y \mid X]] \ge E[\mathrm{VAR}[Y \mid X]]. \tag{2.18} \]

Let $O$ be the random variable which is equal to one when $(T, Z)$ is in $D$ and zero otherwise.
We can now use inequalities 2.15 and 2.18 to find

\[ \mathrm{VAR}[L_n] \ge E[\mathrm{VAR}[L_n \mid O]] \ge \mathrm{VAR}[L_n \mid O = 1] \cdot P(O = 1) \ge 0.9\, \mathrm{VAR}[L_n \mid O = 1]. \tag{2.19} \]

Next, for every $(t, z)$ in $D$ and $r < l+1$ we are going to simulate the random variable $L_n$
conditional on $(T, Z, R) = (t, z, r)$. We denote the result by $L_n(t, z, r)$. In other words, the
distribution of $L_n(t, z, r)$ is equal to

\[ \mathcal{L}(L_n(t, z, r)) = \mathcal{L}(L_n \mid (T, Z, R) = (t, z, r)). \]

Let $(T_D, Z_D)$ denote a variable having the distribution of $(T, Z)$ conditional on the event
$(T, Z) \in D$. We assume that all the $L_n(t, z, r)$ are independent of $(T_D, Z_D)$. Then, we get
that

\[ L_n(T_D, Z_D, R) \]

has the same distribution as $L_n$ conditional on $(T, Z) \in D$. Hence, we get

\[ \mathrm{VAR}[L_n \mid O = 1] = \mathrm{VAR}[L_n(T_D, Z_D, R)]. \tag{2.20} \]

By using 2.18, we find

\[ \mathrm{VAR}[L_n(T_D, Z_D, R)] \ge E[\, \mathrm{VAR}[L_n(T_D, Z_D, R) \mid T_D, R] \,]. \tag{2.21} \]

Note that for $L_n(T_D, Z_D, R)$ to have the same distribution as $L_n$ conditional on $(T, Z) \in D$
and on $R = r$, the variables $L_n(t, z, r)$ do not need to be independent of each other. We are
next going to explain how we simulate the variables $L_n(t, z, r)$ in a bit more detail than before.
We simulate a string $X_{(t,z,r)}$ having the distribution of the string $X$ conditional on the event
$(T, Z, R) = (t, z, r)$. Then we put

\[ L_n(t, z, r) = |\mathrm{LCS}(X_{(t,z,r)}, Y)|. \]

Next, let us describe how we simulate $X_{(t,z,r)}$, based on what was roughly explained at the
beginning of subsection 2.1. Given $t_0 \in D_T$, the leftmost element in $D_T$, and $r_0 < l-1$,
we are going to simulate $X_{(t_0,z,r_0)}$ for $z \in D_Z$ only if $P((T, Z, R) = (t_0, z, r_0)) \ne 0$. We
simulate $X_{(t_0,z_0,r_0)}$ so that it has distribution $\mathcal{L}(X \mid (T, Z, R) = (t_0, z_0, r_0))$. Next, we simulate
$X_{(t_0,z_0+2,r_0)}$ by choosing in $X$, with the same probability, either a block of length $l-1$ or a block
of length $l+1$ and changing its length to $l$. The next realization we simulate is $X_{(t_0,z_0+4,r_0)}$, by
choosing in $X$, with the same probability, a block of length $l-1$ and a block of length $l+1$
and changing their lengths to $l$ (this is our usual random modification). Then by induction we
simulate

\[ \{ X_{(t_0,z_0+4i,r_0)} : i = 1, 2, \ldots \} \]

with our usual random modification and later

\[ \{ X_{(t_0,z_0+2+4i,r_0)} : i = 1, 2, \ldots \} \]

just starting with $X_{(t_0,z_0+2,r_0)}$ and performing our usual random modification to get each of
$X_{(t_0,z_0+6,r_0)}$, $X_{(t_0,z_0+10,r_0)}$, $X_{(t_0,z_0+14,r_0)}$, etc. Both inductions run until indexes $i_0$, resp. $i_0^*$,
satisfying:

\[ z_0 + 4 i_0 \le -\frac{n}{3l} + c\sqrt{n} \;\;\Longrightarrow\;\; i_0 \le \frac{c}{2}\sqrt{n}, \]

\[ z_0 + 2 + 4 i_0^* \le -\frac{n}{3l} + c\sqrt{n}. \]

For simplicity, let us call $z_0, z_1 = z_0 + 2, z_2 = z_0 + 4, \ldots, z_d$ all the values which $Z$ takes. After
we have simulated $X_{(t_0,z_0,r_0)}, X_{(t_0,z_1,r_0)}, \ldots, X_{(t_0,z_d,r_0)}$, we fix $t_1 = t_0 + 1$ and repeat the whole
procedure again, starting with the simulation of $X_{(t_1,z_0,r_0)}$. We keep taking $t_2 < t_3 < t_4 < \cdots$, all
natural numbers in $D_T$, to finish the simulation of $\{ X_{(t,z,r_0)} : t \in D_T,\; z = z_0, z_1, \ldots, z_d \}$.

Once we have finished with that, we take a natural number $r_1 < l-1$ and do all the simulation
above starting with $X_{(t_0,z_0,r_1)}$, only if $P((T, Z, R) = (t_0, z_0, r_1)) \ne 0$. Finally, we obtain the complete
sequence $\{ X_{(t,z,r)} : t \in D_T,\; z = z_0, z_1, \ldots, z_d,\; r = 0, \ldots, l-2 \}$, where each $(t, z, r)$ has
probability $P((T, Z, R) = (t, z, r)) \ne 0$.

We need to verify that this operation gives us the right (uniform) distribution. This is the
content of the next lemma:

Lemma 2.5 Assume that $X_{(t,z,r)}$ is distributed according to

\[ \mathcal{L}(X \mid (T, Z, R) = (t, z, r)). \]

Choose at random (with equal probability) in the string $X_{(t,z,r)}$ a block of length $l+1$ and a block
of length $l-1$, and modify them to both have length $l$. Then the resulting string has distribution

\[ \mathcal{L}(X \mid (T, Z, R) = (t, z+4, r)). \]
Proof. Because of our linear equation system 2.10, conditioning on $(T, Z, R)$
is equivalent to conditioning on $(N_{l-1}, N_l, N_{l+1})$. As mentioned, $X_{(t,z,r)}$ denotes a string of
length $n$ having the distribution of $X$ conditional on $(T, Z, R) = (t, z, r)$. We denote by
$\tilde{X}_{(t,z,r)}$ the string we obtain by performing our random modification on $X_{(t,z,r)}$. In other
words, $\tilde{X}_{(t,z,r)}$ is obtained by choosing a block of length $l+1$ and a block of length $l-1$ at
random in $X_{(t,z,r)}$ and changing them both to length $l$. Let $(n_1, n_2, n_3)$ be the numbers of
blocks of length $l-1$, $l$ and $l+1$ corresponding to $(t, z, r)$. In other words, $n_1$, $n_2$ and $n_3$
are given by the linear system of equations 2.10 when $N_{l-1} = n_1$, $N_l = n_2$, $N_{l+1} = n_3$ and
$T = t$, $Z = z$, $R = r$. We have

\[ P(N_{l-1} = n_1,\; N_l = n_2,\; N_{l+1} = n_3 \mid T = t, Z = z, R = r) = 1. \]

The distribution of the random string $X_{(t,z,r)}$ is the uniform distribution on $\mathcal{X}^n(t, z, r)$. Here,
$\mathcal{X}^n(t, z, r)$ denotes the set of strings of length $n$ which consist only of blocks of length $l-1$,
$l$ and $l+1$, such that the total number of blocks is $t$, whilst the number of blocks of length
$l$ minus the number of blocks of length $l-1$ and $l+1$ is $z$. We also require that the rest
block at the end has length $r$. We can describe $\mathcal{X}^n(t, z, r)$ equivalently as the set of all strings
consisting of exactly $n_1$ blocks of length $l-1$, $n_2$ blocks of length $l$ and $n_3$ blocks of length
$l+1$, with no other blocks allowed except a rest block at the end which has length strictly less
than $l-1$. In other words, the random string $X_{(t,z,r)}$ is such that the number of blocks of
length $l-1$, $l$ and $l+1$ is determined, and only the order in which these blocks appear varies.
In particular, each possible realization of $X_{(t,z,r)}$ which has non-zero probability has the
same probability:

\[ \binom{n_1 + n_2 + n_3}{n_1 \;\; n_2 \;\; n_3}^{-1}. \tag{2.22} \]

When we apply the random modification, the variable $T$ stays the same, the variable $Z$
increases by 4 and the variable $R$ stays the same.

Since the distribution of $X$ conditional on $(T, Z, R)$ is the uniform distribution on the
appropriate set of strings, we have the following: for proving that $\tilde{X}_{(t,z,r)}$ has the distribution of $X$
conditional on $(T, Z, R) = (t, z+4, r)$, it is enough to show that its distribution is the uniform
distribution on $\mathcal{X}^n(t, z+4, r)$. For this, let $\tilde{x}$ denote a (non-random) element of $\mathcal{X}^n(t, z+4, r)$.
Hence, the number of blocks in $\tilde{x}$ of length $l-1$, $l$, resp. $l+1$ is $n_1 - 1$, $n_2 + 2$, resp. $n_3 - 1$.
The probability

\[ P(\tilde{X}_{(t,z,r)} = \tilde{x}) \]

can be calculated as follows: if we only know $\tilde{x}$, any block of length $l$ of $\tilde{x}$ could be the block
which had length $l-1$ and has been turned into length $l$ by the tilde operation (choosing blocks
at random and changing their lengths). The same holds for the block which had length $l+1$. But
when we know these two blocks, then the string before the random modification is uniquely
determined. Let $x$ be such a string which could lead to $\tilde{x}$ after the random modification.
There are hence $\tilde{n}_2 \cdot (\tilde{n}_2 - 1)$ such strings (here, $\tilde{n}_2 = n_2 + 2$, so that $\tilde{n}_2$ denotes the number
of blocks of length $l$ in $\tilde{x}$). The probability, given $X_{(t,z,r)} = x$, that the random string $\tilde{X}_{(t,z,r)}$ turns
out to be $\tilde{x}$ is equal to $1/(n_1 \cdot n_3)$. As a matter of fact, among the $n_1$ blocks of length $l-1$,
there is exactly one which needs to be randomly modified. Similarly, among the $n_3$ blocks of
length $l+1$, there is exactly one which needs to be changed into length $l$ in order to obtain
the string $\tilde{x}$. Hence,

\[ P(\tilde{X}_{(t,z,r)} = \tilde{x} \mid X_{(t,z,r)} = x) = \frac{1}{n_1 \cdot n_3}. \tag{2.23} \]

Let $\mathcal{X}^{n*}$ denote the set of all strings which could lead to $\tilde{x}$ if we apply the random modification
to them. We saw that there are $(n_2 + 2)(n_2 + 1)$ elements in the set $\mathcal{X}^{n*}$. By the law of total
probability, we have

\[ P(\tilde{X}_{(t,z,r)} = \tilde{x}) = \sum_{x \in \mathcal{X}^{n*}} P(\tilde{X}_{(t,z,r)} = \tilde{x} \mid X_{(t,z,r)} = x)\, P(X_{(t,z,r)} = x) = \sum_{x \in \mathcal{X}^{n*}} \frac{1}{n_1 \cdot n_3} \binom{n_1 + n_2 + n_3}{n_1 \;\; n_2 \;\; n_3}^{-1}. \tag{2.24} \]

The last equation above was obtained using 2.23 and 2.22. Note that the sum on the far
right of 2.24 is a sum of $(n_2 + 2)(n_2 + 1)$ equal terms. This leads to

\[ P(\tilde{X}_{(t,z,r)} = \tilde{x}) = \frac{(n_2 + 2)(n_2 + 1)}{n_1 \cdot n_3} \binom{n_1 + n_2 + n_3}{n_1 \;\; n_2 \;\; n_3}^{-1}. \]

The formula on the right side above does not depend on $\tilde{x}$. Hence, this proves that $\tilde{X}_{(t,z,r)}$
has the uniform distribution on the set of strings $\mathcal{X}^n(t, z+4, r)$. But the uniform distribution
is the distribution of $X$ conditional on $(T, Z, R) = (t, z+4, r)$. That is, we have proven that

\[ \mathcal{L}(\tilde{X}_{(t,z,r)}) = \mathcal{L}(X \mid (T, Z, R) = (t, z+4, r)), \]

which finishes this proof. ■
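The statement of Lemma 2.5 can be checked by brute force for small block counts. The sketch below assumes, for illustration, that a string is represented only by the ordered tuple of its block lengths (symbols and the rest block play no role in the counting argument). It starts from the uniform law on all arrangements of $n_1$, $n_2$, $n_3$ blocks of length $l-1$, $l$, $l+1$, applies the random modification exactly, and verifies that the resulting law is uniform on the arrangements with $n_1 - 1$, $n_2 + 2$, $n_3 - 1$ blocks. All function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def arrangements(n1, n2, n3, l):
    """All distinct orderings of n1, n2, n3 blocks of length l-1, l, l+1."""
    blocks = [l - 1] * n1 + [l] * n2 + [l + 1] * n3
    return set(permutations(blocks))


def modified_law(n1, n2, n3, l):
    """Exact law of the modified arrangement when the start is uniform."""
    start = arrangements(n1, n2, n3, l)
    law = Counter()
    for x in start:
        short = [i for i, b in enumerate(x) if b == l - 1]
        long_ = [i for i, b in enumerate(x) if b == l + 1]
        # uniform start, then uniform choice of one short and one long block
        weight = Fraction(1, len(start) * len(short) * len(long_))
        for i in short:
            for j in long_:
                y = list(x)
                y[i] = l
                y[j] = l
                law[tuple(y)] += weight
    return law


law = modified_law(2, 1, 2, l=3)
target = arrangements(1, 3, 1, l=3)
print(set(law) == target, set(law.values()))
```

For $(n_1, n_2, n_3) = (2, 1, 2)$ there are 30 starting arrangements and 20 target arrangements; each target has $(n_2 + 2)(n_2 + 1) = 6$ preimages of weight $1/(30 \cdot 2 \cdot 2)$, which gives probability $1/20$ for every target, in line with the counting in the proof.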

We have seen what happens with the variables $T, Z, R$ after our random modification;
let us now see what happens with the length of the LCS after our random modification. In
what follows, we always consider a triplet of values $(t, z, r)$ such that $P((T, Z, R) = (t, z, r)) \ne
0$. For any $\varepsilon > 0$ let $U^n_{t,r}(\varepsilon)$ denote the event that the map

\[ D_Z \to \mathbb{N} : \; z \mapsto L_n(t, z, r) \]

is increasing with a slope of at least $\varepsilon/8$ on a scale $c_2 \ln(n)$, where $c_2 > 0$ is a large constant
not depending on $n$. More precisely, $U^n_{t,r}(\varepsilon)$ is the event that for any $z_1, z_2$ in $D_Z$ with
$z_2 - z_1 \ge c_2 \ln(n)$ we have

\[ L_n(t, z_2, r) - L_n(t, z_1, r) \ge (z_2 - z_1)\, \varepsilon / 8. \]

The event $U^n_{t,r}(\varepsilon)$ has large probability because we assumed that inequality 2.1 holds. Hence
$z \mapsto L_n(t, z, r)$ can be viewed somehow as behaving like a random walk with drift $\varepsilon$. In the
next lemma we will show this by looking at the event

\[ U^n(\varepsilon) := \bigcap_{t \in D_T,\; r < l+1} U^n_{t,r}(\varepsilon), \]

whose complement we denote by $U^{nc}(\varepsilon)$:

Lemma 2.6 Given $\varepsilon > 0$, take $\alpha$ from inequality 2.1 (Theorem 1.2) and $c_2$ to be big enough
but not depending on $n$, for example $c_2 \ge 80/\varepsilon^2$, depending on $\varepsilon$. Then, there exists a constant
$k > 0$ not depending on $n$ but on $\alpha$ and on $c_2$ such that:

\[ P(U^{nc}(\varepsilon)) \le \frac{k}{n^2} \tag{2.25} \]

for $n$ large enough, provided 2.1 holds.
Proof. We are going to define an event $\mathcal{U}^n(\varepsilon)$ for any $\varepsilon > 0$. Let $\mathcal{U}^n_{(t,z,r)}(\varepsilon)$ be the event that
the expected conditional increase is at least $\varepsilon$ when we introduce the random change into
$X_{(t,z,r)}$. More precisely, let $\mathcal{U}^n_{(t,z,r)}(\varepsilon)$ be the event that

\[ E[\, L_n(t, z+4, r) - L_n(t, z, r) \mid X_{(t,z,r)}, Y \,] \ge \varepsilon. \tag{2.26} \]

Let

\[ \mathcal{U}^n(\varepsilon) := \bigcap_{(t,z) \in D,\; r < l+1} \mathcal{U}^n_{(t,z,r)}(\varepsilon), \]

hence

\[ P(\mathcal{U}^{nc}(\varepsilon)) \le \sum_{(t,z) \in D,\; r < l+1} P(\mathcal{U}^{nc}_{(t,z,r)}(\varepsilon)). \tag{2.27} \]

Note that inequality 2.1 provides a bound for the probability that the conditional expected
increase of the LCS due to our random modification is not larger than or equal to $\varepsilon$. That
probability bound is $\exp(-n^{\alpha})$. The only problem is that the bound is for $X$ and $Y$ whilst the
event $\mathcal{U}^n_{(t,z,r)}(\varepsilon)$ is for $X_{(t,z,r)}$ and $Y$. By going to the conditional probability we must divide
the probability by $P((T, Z, R) = (t, z, r))$. Hence we find

\[ P(\mathcal{U}^{nc}_{(t,z,r)}(\varepsilon)) \le \frac{\exp(-n^{\alpha})}{P((T, Z, R) = (t, z, r))}. \tag{2.28} \]

We can next apply the lower bound on $P((T, Z, R) = (t, z, r))$ provided by Lemma 2.4 for all
values $(t, z) \in D$ and $r < l+1$ to inequality 2.28 and obtain

\[ P(\mathcal{U}^{nc}_{(t,z,r)}(\varepsilon)) \le \frac{1}{k_0} \cdot n \cdot \exp(-n^{\alpha}), \tag{2.29} \]

which still gives an exponentially small bound in $n$. Applying now 2.29 to inequality 2.27
we obtain

\[ P(\mathcal{U}^{nc}(\varepsilon)) \le \frac{4 l c^2}{k_0} \cdot n^2 \cdot \exp(-n^{\alpha}), \tag{2.30} \]

which is an exponentially small bound in $n$. Note that when the event $\mathcal{U}^n(\varepsilon)$ holds, we
have that $z \mapsto L_n(t, z, r)$ behaves like a random walk with drift $\varepsilon$. Let us formalize this. As
before, let $\{z_0, z_1, z_2, \ldots, z_d\}$ be the set of admissible values of $Z$. For fixed $t \in D_T$ and
$r < l+1$, we are going to define $L_n^*(t, z, r)$ inductively for $z \in \{z_0, z_1, z_2, \ldots, z_d\}$. Let us define
$L_n^*(t, z, r) := L_n(t, z, r)$ for the starting values $z \in \{z_0, z_1\}$. Given $z \in \{z_0, z_1, z_2, \ldots, z_d - 4\}$ let
us define $L_n^*(t, z+4, r)$ as follows:

\[ L_n^*(t, z+4, r) := \begin{cases} L_n(t, z+4, r) & \text{if } \mathcal{U}^n_{(t,s,r)}(\varepsilon) \text{ holds for all } s \in \{z_0, z_1, \ldots, z\}, \\ L_n^*(t, z, r) + \varepsilon & \text{otherwise.} \end{cases} \]

Note that when the event $\mathcal{U}^n(\varepsilon)$ holds, then $L_n(t, z, r)$ and $L_n^*(t, z, r)$ are identical for all
$t \in D_T$, $r < l+1$ and $z \in \{z_0, z_1, \ldots, z_d\}$. Let $V^n_{t,r}(\varepsilon)$ be the event that the map

\[ D_Z \to \mathbb{N} : \; z \mapsto L_n^*(t, z, r) \]

is increasing with a slope of at least $\varepsilon/8$ on a scale $c_2 \ln n$.
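Let $V^n(\varepsilon)$ be the event

\[ V^n(\varepsilon) := \bigcap_{t \in D_T,\; r < l+1} V^n_{t,r}(\varepsilon). \]

Hence by using Proposition 2.2 we have that:

\[ P(V^{nc}(\varepsilon)) \le \sum_{t \in D_T,\; r < l+1} P(V^{nc}_{t,r}(\varepsilon)) \le \sum_{t \in D_T,\; r < l+1} 2 n^{-\tau} \le 4 l c\, n^{0.5 - \tau}, \tag{2.31} \]

where $\tau = c_2 \varepsilon^2 / 32$. When $\mathcal{U}^n(\varepsilon)$ holds, then $V^n(\varepsilon)$ and $U^n(\varepsilon)$ are equivalent. Hence

\[ \mathcal{U}^n(\varepsilon) \cap V^n(\varepsilon) \subset U^n(\varepsilon). \]

Hence by using 2.30 and 2.31 we get:

\[ P(U^{nc}(\varepsilon)) \le P(\mathcal{U}^{nc}(\varepsilon)) + P(V^{nc}(\varepsilon)) \le \frac{4 l c^2}{k_0} \cdot n^2 \cdot \exp(-n^{\alpha}) + 4 l c\, n^{0.5 - \tau}. \tag{2.32} \]

To show that the last inequality gives us a rate of convergence to zero of a constant divided
by a polynomial in $n$, we now derive a closed form for the inequality under extra
assumptions on the constants involved.

Taking $c_2 \ge 80/\varepsilon^2$ we have the following bound for the exponent:

\[ 0.5 - \tau \le -2, \]

therefore we can bound

\[ 4 l c\, n^{0.5 - \tau} \le \frac{4 l c}{n^2}. \tag{2.33} \]

Also, we have that:

\[ n^2 \exp(-n^{\alpha}) \le \frac{1}{n^2} \tag{2.34} \]

holds for $n$ large enough. So, by using 2.33 and 2.34 in 2.32 we can finally bound:

\[ P(U^{nc}(\varepsilon)) \le P(\mathcal{U}^{nc}(\varepsilon)) + P(V^{nc}(\varepsilon)) \le \frac{4 l c^2}{k_0}\, n^2 \cdot \exp(-n^{\alpha}) + 4 l c\, n^{0.5 - \tau} \le \left( \frac{4 l c^2}{k_0} + 4 l c \right) \frac{1}{n^2} \]

for $n$ large enough, which ends the proof with $k = 4 l c^2 / k_0 + 4 l c$. ■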

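Proposition 2.2 Given $\varepsilon > 0$, let $V^n_{t,r}(\varepsilon)$ denote the event that the map $z \mapsto L_n^*(t, z, r)$ is
increasing with a slope of at least $\varepsilon/8$ on a scale $c_2 \ln(n)$. Given $t \in D_T$, $r < l+1$ and $z_1, z_2 \in D_Z$
such that $z_2 - z_1 \ge c_2 \ln(n)$ we have the following inequality:

\[ P(V^{nc}_{t,r}(\varepsilon)) \le 2 n^{-\tau}, \]

where $\tau = c_2 \varepsilon^2 / 32$.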
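Proof. Let $z_1, z_2 \in D_Z$ such that $z_1 < z_2$. In order to simplify the notation, let us assume
that $z_2 - z_1$ can be divided by 4 and denote $(z_2 - z_1)/4 = m \in \mathbb{N}$. Let $z_0$ be the leftmost point of
$D_Z$. Given $\varepsilon > 0$, let us recall that $V^n_{t,r}(\varepsilon)$ is the event that the following inequality
holds:

\[ L_n^*(t, z_2, r) - L_n^*(t, z_1, r) \ge \frac{\varepsilon}{8}(z_2 - z_1). \]

Now let us define the filtration $\mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots \subset \mathcal{F}_m$ as follows:

for $i = 1, \ldots, m$. Let us denote

\[ \varepsilon_i = E[\, L_n^*(t, z_1 + 4(i+1), r) - L_n^*(t, z_1 + 4i, r) \mid \mathcal{F}_i \,] \]

and define a martingale $M_0, M_1, \ldots, M_m$ with respect to the filtration $\mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots \subset \mathcal{F}_m$
as follows:

\[ M_0 := L_n^*(t, z_1, r), \qquad M_{i+1} - M_i := L_n^*(t, z_1 + 4(i+1), r) - L_n^*(t, z_1 + 4i, r) - \varepsilon_i \]

for $i = 0, \ldots, m-1$. By definition of the map $z \mapsto L_n^*(t, z, r)$ we have an expected increase of at
least $\varepsilon$ every time $z$ gets increased by 4, so that the expected increase

\[ E[\, L_n^*(t, z_1 + 4(i+1), r) - L_n^*(t, z_1 + 4i, r) \mid \mathcal{F}_i \,] \]

is at least $\varepsilon$, which implies that the inequality

\[ \varepsilon_i \ge \varepsilon \tag{2.35} \]

is satisfied almost surely for every $i = 0, \ldots, m-1$. We can write the increase of the map
$z \mapsto L_n^*(t, z, r)$ in terms of the martingale $M_0, \ldots, M_m$ in the following way:

\[ L_n^*(t, z_2, r) - L_n^*(t, z_1, r) = M_m - M_0 + \sum_{i=0}^{m-1} \varepsilon_i. \]

Now, we are ready to estimate the probability of $V^{nc}_{t,r}(\varepsilon)$:

\[ P(V^{nc}_{t,r}(\varepsilon)) = P\left( L_n^*(t, z_2, r) - L_n^*(t, z_1, r) < \frac{\varepsilon}{8}(z_2 - z_1) \right) \]
\[ = P\left( M_m - M_0 + \sum_{i=0}^{m-1} \varepsilon_i < \frac{\varepsilon}{8}(z_2 - z_1) \right) \]
\[ = P\left( M_m - M_0 < \frac{\varepsilon}{8}(z_2 - z_1) - \sum_{i=0}^{m-1} \varepsilon_i \right) \tag{2.36} \]
\[ \text{(by 2.35 and } z_2 - z_1 = 4m) \;\; \le P\left( M_m - M_0 < \frac{\varepsilon}{8}(z_2 - z_1) - \frac{\varepsilon}{4}(z_2 - z_1) \right) \]
\[ = P\left( M_m - M_0 < -\frac{\varepsilon}{8}(z_2 - z_1) \right). \tag{2.37} \]

At this point we want to use the Azuma-Hoeffding inequality (Theorem 1.3). For this, we note that
for every $i = 0, \ldots, m-1$ we have

\[ P(|M_{i+1} - M_i| \le 1) = 1 \]

since $\varepsilon \le 1$, and we take $v = \frac{\varepsilon}{8}(z_2 - z_1)$ to write:

\[ P\left( M_m - M_0 < -\frac{\varepsilon}{8}(z_2 - z_1) \right) \le 2\exp\left( -\frac{\varepsilon^2 (z_2 - z_1)^2}{64 \cdot 2m} \right) \]
\[ \text{(by using } z_2 - z_1 = 4m) \;\; = 2\exp\left( -\frac{\varepsilon^2}{32}(z_2 - z_1) \right). \tag{2.38} \]

Combining 2.37 and 2.38 we finally have:

\[ P(V^{nc}_{t,r}(\varepsilon)) \le 2\exp\left( -\frac{\varepsilon^2}{32}(z_2 - z_1) \right), \]

from where, after taking $z_2 - z_1 \ge c_2 \ln(n)$, we have:

\[ P(V^{nc}_{t,r}(\varepsilon)) \le 2\exp\left( -\frac{c_2 \varepsilon^2}{32} \ln(n) \right) = 2 n^{-\tau}, \]

which finishes the proof. ■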

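Note that by the law of total expectation, $E[\, \mathrm{VAR}[L_n(T_D, Z_D, R) \mid T_D, R] \,]$ is equal to

\[ P(U^n(\varepsilon))\, E[\, \mathrm{VAR}[L_n(T_D, Z_D, R) \mid T_D, R] \mid U^n(\varepsilon) \,] + P(U^{nc}(\varepsilon))\, E[\, \mathrm{VAR}[L_n(T_D, Z_D, R) \mid T_D, R] \mid U^{nc}(\varepsilon) \,] \]

for every $\varepsilon > 0$ and hence:

\[ E[\, \mathrm{VAR}[L_n(T_D, Z_D, R) \mid T_D, R] \,] \ge P(U^n(\varepsilon))\, E[\, \mathrm{VAR}[L_n(T_D, Z_D, R) \mid T_D, R] \mid U^n(\varepsilon) \,]. \tag{2.39} \]

Now, conditional on the event $U^n(\varepsilon)$ holding, we have that the random map

\[ D_Z \to \mathbb{N} : \; z \mapsto L_n(t, z, r) \]

has a slope of at least $\varepsilon/8$ on a scale of $c_2 \ln(n)$ (as in Proposition 2.2) for any $t \in D_T$ and
$r < l+1$, so that:

\[ z_2 - z_1 \ge c_2 \ln(n) \;\Rightarrow\; L_n(t, z_2, r) - L_n(t, z_1, r) \ge \frac{\varepsilon}{8}(z_2 - z_1) \]

\[ z_2 - z_1 < c_2 \ln(n) \;\Rightarrow\; L_n(t, z_2, r) - L_n(t, z_1, r) \le 2 (z_2 - z_1) \]

hold. Hence, conditional on $U^n(\varepsilon)$, we can apply Lemma 2.2 and obtain:

\[ \mathrm{VAR}[L_n(t, Z_D, R) \mid T_D = t, R = r, U^n(\varepsilon)] \ge \frac{\varepsilon^2}{64} \left( 1 - 16\, \frac{(\varepsilon/8 + 2)\, c_2 \ln(n)}{\varepsilon \sqrt{\mathrm{VAR}[Z_D \mid T_D = t, R = r]}} \right) \mathrm{VAR}[Z_D \mid T_D = t, R = r]. \tag{2.40} \]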
The next results give us a uniform bound for $\mathrm{VAR}[Z_D \mid T_D = t, R = r]$ for all $t \in D_T$.

Lemma 2.7 There exists a constant $K > 0$ not depending on $n$ such that:

\[ 1 - \frac{K}{\sqrt{n}} \le \frac{P(Z_D = z+4 \mid T_D = t, R = r)}{P(Z_D = z \mid T_D = t, R = r)} \le 1 + \frac{K}{\sqrt{n}} \]

for every $(t, z) \in D$, $r < l+1$ and $n$ large enough.

Lemma 2.8 There exists a constant $C > 0$ not depending on $n$ such that:

\[ \mathrm{VAR}[Z_D \mid T_D = t, R = r] \ge C \cdot n \tag{2.41} \]

for every $t \in D_T$, $r < l+1$ and for every $n$ large enough.
Using the bound in Lemma 2.8 we get the following inequality:

\[ \left( 1 - 16\, \frac{(\varepsilon/8 + 2)\, c_2 \ln(n)}{\varepsilon \sqrt{\mathrm{VAR}[Z_D \mid T_D = t, R = r]}} \right) \ge \left( 1 - 16\, \frac{(\varepsilon/8 + 2)\, c_2 \ln(n)}{\varepsilon \sqrt{C n}} \right) \ge 0.5 \tag{2.42} \]

for $n$ large enough. Using inequality 2.42 above with inequality 2.40 we find:

\[ \mathrm{VAR}[L_n(t, Z_D, R) \mid T_D = t, R = r, U^n(\varepsilon)] \ge \frac{\varepsilon^2}{64} \cdot 0.5 \cdot \mathrm{VAR}[Z_D \mid T_D = t, R = r]. \]

Using again Lemma 2.8 we find that the left side of the above inequality is larger than $\frac{\varepsilon^2 C}{128}\, n$
and hence:

\[ E[\, \mathrm{VAR}[L_n(T_D, Z_D, R) \mid T_D, R] \mid U^n(\varepsilon) \,] \ge \frac{\varepsilon^2 C}{128}\, n. \tag{2.43} \]

We can now combine inequalities 2.19, 2.20, 2.21, 2.39 and 2.43 to obtain:

\[ \mathrm{VAR}[L_n] \ge 0.9\, P(U^n(\varepsilon))\, \frac{\varepsilon^2 C}{128}\, n \]

and plugging in the lower bound for $P(U^n(\varepsilon))$ obtained in 2.25 (Lemma 2.6) we get:

\[ \mathrm{VAR}[L_n] \ge 0.9\, \frac{\varepsilon^2 C}{128}\, n \left( 1 - \frac{k}{n^2} \right), \]

where $k > 0$ is the constant from Lemma 2.6. This expression is a lower bound of order $\Theta(n)$
for $\mathrm{VAR}[L_n]$. Hence, we have finished proving the statement of Theorem 1.2.